Abstract. We introduce a version of Stein's method for proving concentration and moment inequalities in problems with dependence. Simple illustrative examples from combinatorics, physics, and mathematical statistics are provided.
1. Introduction and results



Stein's method was introduced by Charles Stein [?] in the context of normal approximation for sums of dependent random variables. Stein's version of his method, best known as the "method of exchangeable pairs", attained maturity in his later work [?]. A reasonably large literature has developed around the subject, but it has almost exclusively developed as a method of proving distributional convergence with error bounds. Stein's attempts at getting large deviations in [?] did not, unfortunately, prove fruitful. Some progress for sums of dependent random variables was made by Raič [33].

A general version of Stein's method for concentration inequalities was introduced for the first time in the Ph.D. thesis [?] of the present author. The purpose of this paper is to explain the theory developed in [?] via examples. Another application is in [12].

This section is organized as follows. First, we give three examples, followed by the main abstract theorem; finally, towards the end of the section, we present very condensed overviews of Stein's method, concentration of measure, and the related literature. Proofs are in Section 2.




1.1. A generalized matching problem. Let $\{a_{ij}\}$ be an $n\times n$ array of real numbers. Let $\pi$ be chosen uniformly at random from the set of all permutations of $\{1,\ldots,n\}$, and let $X=\sum_{i=1}^n a_{i\pi(i)}$. This class of random variables was first studied by Hoeffding [23], who proved that they are approximately normally distributed under certain conditions. It is easy to see that various well-studied functions of random permutations, like the number of fixed points, the sum of a random sample picked without replacement from a finite population, and the function $\sum_i|i-\pi(i)|$ (known as Spearman's footrule [?]), are all instances of Hoeffding's statistic.
Hoeffding's statistic has a long history of association with Stein's method. In fact, in an unpublished work Stein introduced his method to treat the normal approximation problem for this object. Bolthausen [?] used Stein's method to give a Berry-Esseen bound. Bolthausen and Götze [?] gave multivariate central limit theorems under a further generalized setup. However, we have not seen large deviation or concentration bounds for this statistic, obtained by any method.

Our version of Stein's method easily yields the following tail bound.

Proposition 1.1. Let $\{a_{ij}\}_{1\le i,j\le n}$ be a collection of numbers from $[0,1]$. Let $X=\sum_{i=1}^n a_{i\pi(i)}$, where $\pi$ is drawn from the uniform distribution over the set of all permutations of $\{1,\ldots,n\}$. Then
$$P\{|X-E(X)|\ge t\}\le 2\exp\Big(-\frac{t^2}{4E(X)+2t}\Big)$$
for any $t\ge0$.

Note that the bound has no explicit dependence on $n$. Note also the automatic transition from Poissonian to gaussian tails as $E(X)$ becomes large: when $E(X)$ is small the bound behaves like $\exp(-Ct)$, whereas when $E(X)$ is large it is essentially a gaussian tail with standard deviation of order $\sqrt{E(X)}$. These two properties characterize it as a so-called "Bernstein type inequality", named after the classical Bernstein inequality (see [37], page 855) for sums of bounded independent random variables.

The classical result of Maurey [?] can only imply a weaker inequality of the form $P\{X\ge E(X)+t\}\le e^{-ct^2/n}$, where $c$ is a numerical constant. However, it is possible to derive a Bernstein bound similar to Proposition 1.1 (albeit with a significantly worse constant in the exponent) using Michel Talagrand's deep theorem about concentration of random permutations (Theorem 5.1 in Section 5 of [?]; see also McDiarmid [31] and Luczak & McDiarmid [29]).

For a concrete application, let $X$ be the number of fixed points of a random permutation $\pi$. Then $X=\sum_{i=1}^n a_{i\pi(i)}$, where $a_{ij}=I\{i=j\}$. Since $E(X)=1$, Proposition 1.1 gives $P\{|X-1|\ge t\}\le2\exp(-t^2/(4+2t))$. Of course, we do not expect this to be the best possible bound in this very well-understood problem; it is only meant as an illustration. In fact, the exact distribution of the number of fixed points is known (see Feller [?], Section IV.4), and it gives a tail bound like $\exp(-Ct\log t)$.
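To put the bound of Proposition 1.1 next to actual frequencies in the fixed-point example, here is a small Monte Carlo sketch in Python. It is only an illustration: permutations of $\{0,\ldots,n-1\}$ are generated with `random.shuffle`, and the helper names (`fixed_points`, `prop_1_1_bound`, `empirical_tail`) are hypothetical, not taken from the paper.

```python
import math
import random


def fixed_points(n: int, rng: random.Random) -> int:
    """Number of fixed points of a uniformly random permutation of {0, ..., n-1}."""
    perm = list(range(n))
    rng.shuffle(perm)  # Fisher-Yates shuffle: uniform over all n! permutations
    return sum(1 for i, p in enumerate(perm) if i == p)


def prop_1_1_bound(t: float, mean: float) -> float:
    """Right-hand side of Proposition 1.1: 2 exp(-t^2 / (4 E(X) + 2t))."""
    return 2.0 * math.exp(-t * t / (4.0 * mean + 2.0 * t))


def empirical_tail(n: int, t: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P{|X - 1| >= t} for X = number of fixed points (E(X) = 1)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if abs(fixed_points(n, rng) - 1) >= t)
    return hits / trials


for t in (2, 4, 6):
    print(t, empirical_tail(n=50, t=t, trials=10_000), prop_1_1_bound(t, mean=1.0))
```

With $t=4$ the bound equals $2e^{-4/3}\approx0.53$, whereas $P\{X\ge5\}$ is of order $4\times10^{-3}$ for large $n$, so the simulated frequencies should sit far below the bound, as the remark above anticipates.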

Finally, we also have a "Burkholder-Davis-Gundy" type inequality for Hoeffding's statistic which does not require a bound on the $a_{ij}$'s.

Proposition 1.2. Let $\{a_{ij}\}_{1\le i,j\le n}$ be an arbitrary collection of real numbers. Let $\pi$ be a uniform random permutation, and let $X=\sum_{i=1}^n a_{i\pi(i)}$. Define
$$A:=\frac1{4n}\sum_{i,j=1}^n\big(a_{i\pi(i)}+a_{j\pi(j)}-a_{i\pi(j)}-a_{j\pi(i)}\big)^2.$$
Then for every positive integer $k$, we have $E(X-E(X))^{2k}\le(2k-1)^kE(A^k)$.
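For small $n$ both sides of Proposition 1.2 can be computed exactly by enumerating all $n!$ permutations. The Python sketch below does this with $A=\frac1{4n}\sum_{i,j=1}^n(a_{i\pi(i)}+a_{j\pi(j)}-a_{i\pi(j)}-a_{j\pi(i)})^2$, which equals $\frac n4E((X-X')^2\mid\pi)$ for the random-transposition pair used in the proof of Proposition 1.1; the function names and the random test array are our own choices. For $k=1$ the two sides should agree, because part (i) of Theorem 1.5 gives $\operatorname{Var}(X)=E(A)$ exactly for that pair.

```python
import itertools
import math
import random


def hoeffding_stat(a, perm):
    """X = sum_i a[i][perm[i]]."""
    return sum(a[i][perm[i]] for i in range(len(perm)))


def a_statistic(a, perm):
    """A = (1/(4n)) sum_{i,j} (a_{i,pi(i)} + a_{j,pi(j)} - a_{i,pi(j)} - a_{j,pi(i)})^2."""
    n = len(perm)
    total = 0.0
    for i in range(n):
        for j in range(n):
            d = a[i][perm[i]] + a[j][perm[j]] - a[i][perm[j]] - a[j][perm[i]]
            total += d * d
    return total / (4 * n)


def moment_sides(a, k):
    """Exact E(X - EX)^{2k} and (2k-1)^k E(A^k) under a uniform random permutation."""
    perms = list(itertools.permutations(range(len(a))))
    xs = [hoeffding_stat(a, p) for p in perms]
    mean = sum(xs) / len(xs)
    lhs = sum((x - mean) ** (2 * k) for x in xs) / len(xs)
    rhs = (2 * k - 1) ** k * sum(a_statistic(a, p) ** k for p in perms) / len(perms)
    return lhs, rhs


rng = random.Random(1)
a = [[rng.uniform(-3.0, 3.0) for _ in range(5)] for _ in range(5)]  # arbitrary real entries
print("k=1 equality:", math.isclose(*moment_sides(a, 1), rel_tol=1e-9))
for k in (2, 3):
    print(k, moment_sides(a, k))
```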
For a general exposition of the famous Burkholder-Davis-Gundy martingale inequalities we refer to the article by Burkholder [?].

1.2. Magnetization in the Curie-Weiss model. Fix any $\beta\ge0$, $h\in\mathbb R$, and consider the probability mass function (the Gibbs measure) on $\{-1,1\}^n$ given by
$$P(\{\sigma\}):=Z^{-1}\exp\Big(\frac\beta n\sum_{i<j}\sigma_i\sigma_j+\beta h\sum_i\sigma_i\Big),\tag{1}$$
where $\sigma=(\sigma_1,\ldots,\sigma_n)$ is a typical element of $\{-1,1\}^n$ and $Z$ is the normalizing constant (which depends on $\beta$ and $h$). This is known as the 'Curie-Weiss model of ferromagnetic interaction' at inverse temperature $\beta$ and external field $h$. The $\sigma_i$'s stand for the spins of $n$ particles, each having a spin of $+1$ or $-1$. The ferromagnetic interaction between the particles is captured in a very simplistic manner by the first term in the hamiltonian.

The magnetization of the system, as a function of the configuration $\sigma$, is defined as $m(\sigma):=\frac1n\sum_{i=1}^n\sigma_i$. If $n$ is large and $\sigma$ is drawn from the Gibbs measure, then with high probability the magnetization satisfies
$$m(\sigma)\approx\tanh(\beta m(\sigma)+\beta h).\tag{2}$$
The equation $x=\tanh(\beta x+\beta h)$ has a unique root for small values of $\beta$ and multiple solutions for $\beta$ above a critical value. In physics parlance, this is described by saying that the Curie-Weiss model exhibits "spontaneous magnetization" at low temperatures. For a formal discussion with rigorous proofs, we refer to Ellis [?], Section IV.4.
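The root structure of $x=\tanh(\beta x+\beta h)$ is easy to explore numerically. The following Python sketch (our illustration; the grid size and tolerance are arbitrary choices) brackets sign changes of $g(x)=x-\tanh(\beta x+\beta h)$ on $[-1,1]$, where all roots must lie because $|\tanh|<1$, and refines each bracket by bisection. A root at which $g$ touches zero without changing sign, which happens only for special parameter values, would be missed.

```python
import math
from typing import List


def mean_field_roots(beta: float, h: float, grid: int = 1001, tol: float = 1e-12) -> List[float]:
    """Roots of x = tanh(beta*x + beta*h), located by sign changes and bisection."""
    def g(x: float) -> float:
        return x - math.tanh(beta * x + beta * h)

    # An odd number of grid intervals keeps x = 0 off the grid, so a root at 0 is bracketed.
    xs = [-1.0 + 2.0 * k / grid for k in range(grid + 1)]
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        if g(lo) * g(hi) < 0:
            while hi - lo > tol:
                mid = 0.5 * (lo + hi)
                if g(lo) * g(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


print(mean_field_roots(0.5, 0.0))  # high temperature: a single root at 0
print(mean_field_roots(2.0, 0.0))  # low temperature, no field: three roots
```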

The following proposition formalizes (2) with finite sample tail bounds.

Proposition 1.3. Suppose $\sigma$ is drawn from the Gibbs measure (1). Then, for any $\beta\ge0$, $h\in\mathbb R$, $n\ge1$, and $t\ge0$, the magnetization $m:=\frac1n\sum_i\sigma_i$ satisfies
$$P\Big\{\big|m-\tanh(\beta m+\beta h)\big|\ge\frac\beta n+\frac t{\sqrt n}\Big\}\le2\exp\Big(-\frac{t^2}{4(1+\beta)}\Big).$$

Although the Curie-Weiss model is a simple model of ferromagnetic interaction, we have not encountered any result in the literature which gives an explicit bound like the above. In particular, the result shows concentration of $m(\sigma)$ around the set of roots of $x=\tanh(\beta x+\beta h)$, and not just around its mean. However, concentration inequalities for Gibbs measures, without explicit constants and under various mixing conditions, have been obtained before. For a history of the literature and some significant recent progress, we refer to Chazottes et al. [?].

1.3. Least squares estimation in the Ising model. The Ising model is another model of ferromagnetic interaction. Given an undirected graph $G=(V,E)$ on the vertex set $V=\{1,\ldots,n\}$, the Ising model without external field assigns the following probability mass function on $\{-1,1\}^n$:
$$P(\{\sigma\})=Z(\beta)^{-1}\exp\Big(\beta\sum_{\{i,j\}\in E}\sigma_i\sigma_j\Big).\tag{3}$$
Here, as before, $\beta$ is the inverse temperature and $Z(\beta)$ is the normalizing constant. A natural statistical problem in this model is the following: how can one make inference about $\beta$ when the data consist of a single configuration generated from the Gibbs measure?

The classical maximum likelihood approach for this problem was first considered by Pickard [32]. Iterative methods for computing the maximum likelihood estimator (e.g. Geyer & Thompson [?], Jerrum & Sinclair [26]) are widely used nowadays. The Jerrum-Sinclair algorithm for computing the normalizing constant in the Ising model provably converges in polynomial time. However, it is not so clear whether the MLE is a good estimator at all, particularly at critical temperatures.

Here we investigate a method of estimating $\beta$ by minimizing an explicit sum of squares. First, let $\sigma$ be drawn from the Gibbs measure on $\{-1,1\}^n$, and for each $i$, let
$$m_i:=\sum_{j:\,\{i,j\}\in E}\sigma_j.$$
For each $u\ge0$, let
$$S(u):=\frac1n\sum_{i=1}^n\big(\sigma_i-\tanh(u\,m_i)\big)^2.\tag{4}$$
The 'least-squares estimate' of $\beta$ is defined to be
$$\hat\beta_{LS}:=\operatorname{argmin}_{u\ge0}S(u).$$
Note that $\hat\beta_{LS}$ is very easy to compute in practice, because $S$ is a smooth function of a single variable.
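As a sketch of how cheap the computation is, the Python code below evaluates $S(u)$ from (4) and minimizes it by a plain grid search over $[0,u_{\max}]$. The grid search, the cutoff `u_max`, and the helper names are our assumptions for illustration (any one-dimensional minimizer would do); note that $S(u)$ may keep decreasing as $u\to\infty$, in which case the grid minimizer sits at `u_max`.

```python
import math
from typing import List, Sequence


def local_fields(sigma: Sequence[int], neighbors: Sequence[Sequence[int]]) -> List[int]:
    """m_i = sum of sigma_j over the neighbors j of vertex i."""
    return [sum(sigma[j] for j in neighbors[i]) for i in range(len(sigma))]


def S(u: float, sigma: Sequence[int], m: Sequence[int]) -> float:
    """The least-squares criterion (4): (1/n) sum_i (sigma_i - tanh(u m_i))^2."""
    return sum((s - math.tanh(u * mi)) ** 2 for s, mi in zip(sigma, m)) / len(sigma)


def beta_ls(sigma, neighbors, u_max: float = 5.0, steps: int = 5000) -> float:
    """Grid-search approximation to argmin_{0 <= u <= u_max} S(u)."""
    m = local_fields(sigma, neighbors)
    best_u, best_val = 0.0, S(0.0, sigma, m)
    for k in range(1, steps + 1):
        u = u_max * k / steps
        val = S(u, sigma, m)
        if val < best_val:
            best_u, best_val = u, val
    return best_u


def cycle(n: int) -> List[List[int]]:
    """Neighbor lists of the cycle graph on n vertices (maximum degree r = 2)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


print(beta_ls([1, 1, -1, 1, 1, 1, -1, 1], cycle(8)))
```

On the all-plus configuration of a cycle $S$ is strictly decreasing, so the estimate lands at `u_max`; on the alternating configuration it is strictly increasing, so the estimate is $0$.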

The least-squares technique is well known and commonly used in the analysis of gaussian Markov random field (GMRF) models (probably originating from Besag [?]), but rigorous results are scarce.

Proposition 1.4 (stated below) shows that the random function $S$ indeed attains an approximate global minimum near $\beta$. In fact, it gives
$$E\Big|S(\beta)-\min_{u\ge0}S(u)\Big|=O\Big(\sqrt{\frac{r\log n}n}\Big),$$
where $r$ is the maximum degree of the dependency graph $G$ (recall that the degree of a vertex is the number of neighbors of that vertex, and the maximum degree of a graph is the maximum vertex degree).
Proposition 1.4. Let $r$ be the maximum degree of the dependency graph $G$ in the Ising model (3), and let $S(u)$ be defined as in (4). Take any $t\ge0$ and let
$$\epsilon:=\sqrt{\frac{r(\log n+t)}n}.$$
Then we have
$$P\Big\{S(\beta)>\min_{u\ge0}S(u)+C\epsilon\Big\}\le\exp(-Kt),$$
where $C$ and $K$ are numerical constants.

Although it is unclear whether Proposition 1.4 is useful from a statistical point of view, it seems to be interesting as a mathematical result. For instance, observe that the conclusion is valid at any temperature. This is quite remarkable, since the low temperature phase of the Ising model is notoriously intractable for most graphs.

Here we should also mention that the technique can be easily applied to 
the Ising model with an external field, but we prefer to restrict ourselves to 
the problem of estimating a single parameter (the temperature) for the sake 
of clarity. 

1.4. The abstract result. The following theorem encapsulates the concentration and moment inequalities used to work out all the examples in this paper.

Theorem 1.5. Let $\mathcal X$ be a separable metric space and suppose $(X,X')$ is an exchangeable pair of $\mathcal X$-valued random variables. Suppose $f:\mathcal X\to\mathbb R$ and $F:\mathcal X\times\mathcal X\to\mathbb R$ are square-integrable functions such that $F$ is antisymmetric (i.e. $F(X,X')=-F(X',X)$ a.s.), and $E(F(X,X')\mid X)=f(X)$ a.s. Let
$$\Delta(X):=\frac12E\big(|(f(X)-f(X'))F(X,X')|\,\big|\,X\big).$$
Then $E(f(X))=0$, and the following concentration results hold for $f(X)$:

(i) If $E(f(X)^2)<\infty$, then $\operatorname{Var}(f(X))=\frac12E\big((f(X)-f(X'))F(X,X')\big)$.

(ii) Assume that $E(e^{\theta f(X)}|F(X,X')|)<\infty$ for all $\theta$. If there exist nonnegative constants $B$ and $C$ such that $\Delta(X)\le Bf(X)+C$ almost surely, then for any $t\ge0$,
$$P\{f(X)\ge t\}\le\exp\Big(-\frac{t^2}{2C+2Bt}\Big)\quad\text{and}\quad P\{f(X)\le-t\}\le\exp\Big(-\frac{t^2}{2C}\Big).$$

(iii) For any positive integer $k$, we have the following exchangeable pairs version of the Burkholder-Davis-Gundy inequality:
$$E(f(X)^{2k})\le(2k-1)^kE(\Delta(X)^k).$$

To see how the exchangeable pairs are constructed and how the theorem is applied in our examples, one has to look at the proofs in Section 2. However, for a quick illustration, we will now work out the inequalities for sums of independent random variables, taking care to spell out details.



6 SOURAV CHATTERJEE 

1.5. Simplest example. Let $X=\sum_{i=1}^nY_i$, where the $Y_i$'s are independent square integrable random variables. Let $\mu_i=E(Y_i)$ and $\sigma_i^2=\operatorname{Var}(Y_i)$. An exchangeable pair is created by choosing a coordinate $I$ uniformly at random from $\{1,\ldots,n\}$, and defining
$$X':=X-Y_I+Y_I',$$
where $Y_1',\ldots,Y_n'$ are independent copies of $Y_1,\ldots,Y_n$. Let
$$F(x,y)=n(x-y).$$
Then
$$E(F(X,X')\mid Y_1,\ldots,Y_n)=\frac1n\sum_{i=1}^nE\big(n(Y_i-Y_i')\mid Y_1,\ldots,Y_n\big)=X-E(X).$$
Since the right hand side depends only on $X$, we have
$$f(X)=E(F(X,X')\mid X)=X-E(X).$$
Thus, from part (i) of Theorem 1.5 we get the elementary identity
$$\operatorname{Var}(X)=\frac12E\big(n(X-X')^2\big)=\frac12\sum_{i=1}^nE(Y_i-Y_i')^2=\sum_{i=1}^n\sigma_i^2.$$
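These identities can be checked exactly when the $Y_i$ take finitely many values. The Python sketch below takes $Y_i\sim\text{Bernoulli}(p_i)$ (our assumption, purely for illustration), enumerates every outcome of $(Y,I,Y_I')$, confirms that $E(F(X,X')\mid Y_1,\ldots,Y_n)=X-E(X)$, and recovers $\operatorname{Var}(X)=\sum_i\sigma_i^2$ via part (i) of Theorem 1.5.

```python
import itertools
from typing import Sequence, Tuple


def exact_check(p: Sequence[float]) -> Tuple[float, float, float]:
    """For independent Y_i ~ Bernoulli(p_i), X = sum Y_i and X' = X - Y_I + Y'_I, return
    ((1/2) E[F(X,X')(X - X')], sum_i p_i(1 - p_i), max_Y |E(F | Y) - (X - EX)|)."""
    n = len(p)
    mean_x = sum(p)
    half_moment = 0.0  # accumulates (1/2) E[F(X,X') (X - X')]
    worst = 0.0        # largest deviation of E(F | Y) from X - E(X)
    for y in itertools.product((0, 1), repeat=n):
        prob_y = 1.0
        for yi, pi in zip(y, p):
            prob_y *= pi if yi else 1.0 - pi
        x = sum(y)
        cond = 0.0  # E(F(X, X') | Y = y)
        for i in range(n):
            for new_value, q in ((1, p[i]), (0, 1.0 - p[i])):
                w = q / n  # P(I = i, Y'_i = new_value)
                x_new = x - y[i] + new_value
                F = n * (x - x_new)  # F(x, y) = n (x - y)
                cond += w * F
                half_moment += prob_y * w * 0.5 * F * (x - x_new)
        worst = max(worst, abs(cond - (x - mean_x)))
    variance = sum(pi * (1.0 - pi) for pi in p)
    return half_moment, variance, worst


print(exact_check([0.1, 0.5, 0.7]))
```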



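Now note that
$$\Delta(X)=\frac n2E\big((X-X')^2\mid X\big)=\frac12\sum_{i=1}^nE\big((Y_i-Y_i')^2\mid X\big).$$
If $c_1,\ldots,c_n$ are constants such that $|Y_i-\mu_i|\le c_i$ a.s. for each $i$, then, since $Y_i'$ is independent of $X$,
$$E\big((Y_i-Y_i')^2\mid X\big)=E\big((Y_i-\mu_i)^2\mid X\big)+E\big((Y_i'-\mu_i)^2\big)\le c_i^2+\sigma_i^2.$$
Part (ii) of Theorem 1.5 (with $B=0$ and $C=\frac12\sum_i(c_i^2+\sigma_i^2)$) now implies that
$$P\{|X-E(X)|\ge t\}\le2\exp\Big(-\frac{t^2}{\sum_{i=1}^n(c_i^2+\sigma_i^2)}\Big).$$
This is similar to (but not exactly the same as) the classical Hoeffding inequality [25] for sums of bounded random variables.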

Now suppose that $0\le Y_i\le1$ a.s. for each $i$. If the $\mu_i$'s are very small, then the Hoeffding bound is wasteful. A more careful analysis gives a better result, as follows. First, note that
$$\Delta(X)=\frac12\sum_{i=1}^nE\big((Y_i-Y_i')^2\mid X\big)=\frac12\sum_{i=1}^n\big(E(Y_i^2)-2\mu_iE(Y_i\mid X)+E(Y_i^2\mid X)\big).$$
Using the assumption that $0\le Y_i\le1$ (so that $Y_i^2\le Y_i$ and $\mu_iE(Y_i\mid X)\ge0$), we get
$$\Delta(X)\le\frac12\sum_{i=1}^n\big(E(Y_i)+E(Y_i\mid X)\big)=\frac12(E(X)+X)=\frac12f(X)+E(X).$$
Thus, we can take $B=1/2$ and $C=E(X)$ in part (ii) of Theorem 1.5, which gives
$$P\{|X-E(X)|\ge t\}\le2\exp\Big(-\frac{t^2}{2E(X)+t}\Big).$$
Again, this is a version of the classical Bernstein inequality (see [37], page 855) for sums of independent random variables.
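To see when the Bernstein-type bound improves on the Hoeffding-type bound, take the $Y_i$ i.i.d. Bernoulli$(p)$, so that $\mu_i=p$, $\sigma_i^2=p(1-p)$, and one may take $c_i=\max(p,1-p)$. The Python sketch below evaluates both right-hand sides; the function names are illustrative, and capping the bounds at 1 is our addition.

```python
import math


def hoeffding_type(t: float, n: int, p: float) -> float:
    """2 exp(-t^2 / sum_i (c_i^2 + sigma_i^2)) with c_i = max(p, 1-p), sigma_i^2 = p(1-p), capped at 1."""
    c = max(p, 1.0 - p)
    return min(1.0, 2.0 * math.exp(-t * t / (n * (c * c + p * (1.0 - p)))))


def bernstein_type(t: float, n: int, p: float) -> float:
    """2 exp(-t^2 / (2 E(X) + t)) with E(X) = n p (valid since 0 <= Y_i <= 1), capped at 1."""
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * n * p + t)))


for p, t in ((0.01, 20.0), (0.5, 50.0)):
    print(p, t, hoeffding_type(t, 1000, p), bernstein_type(t, 1000, p))
```

With $n=1000$, $p=0.01$, $t=20$ the Hoeffding-type expression exceeds 1, while the Bernstein-type bound is $2e^{-10}\approx9\times10^{-5}$; with $p=1/2$ and $t=50$ the ordering reverses, consistent with the remark that the Hoeffding bound is wasteful only when the $\mu_i$'s are small.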

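Finally, observe that by part (iii) of Theorem 1.5 and an application of Jensen's inequality, we have for each positive integer $k$,
$$E\big(X-E(X)\big)^{2k}\le(2k-1)^kE\Big(\frac12\sum_{i=1}^nE\big((Y_i-\mu_i)^2+(Y_i'-\mu_i)^2\mid X\big)\Big)^k\le(2k-1)^kE\Big(\sum_{i=1}^n(Y_i-\mu_i)^2\Big)^k.$$
(In the second step, Jensen's inequality removes the conditional expectation, and then the convexity of $x\mapsto x^k$ and the fact that $(Y_i')$ has the same law as $(Y_i)$ finish the bound.) This is exactly what the Burkholder-Davis-Gundy inequality [?] would give us for sums of independent random variables (although in this case, it can be derived by easier methods).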

In the remainder of this section, we give very short overviews of Stein's 
method and concentration of measure. 

1.6. Stein's method. Suppose we want to show that a random variable $X$ taking values in some space $\mathcal X$ has approximately the same distribution as some other random variable $Z$. The classical version of Stein's method [?] involves four steps:

(1) Identify a "characterizing operator" $T$ for $Z$, which has the defining property that for any function $g$ belonging to a fixed large class of functions, $E\,Tg(Z)=0$. For instance, if $\mathcal X=\mathbb R$ and $Z$ is a standard gaussian random variable, then $Tg(x):=g'(x)-xg(x)$ is a characterizing operator, acting on all locally absolutely continuous $g$ with subexponential growth at infinity.

(2) Construct a random variable $X'$ such that $(X,X')$ is an exchangeable pair.

(3) Find an operator $\alpha$ such that for any suitable $h:\mathcal X\to\mathbb R$, $\alpha h$ is an antisymmetric function (i.e. $\alpha h(x,y)=-\alpha h(y,x)$) and
$$\big|E(\alpha h(X,X')\mid X=x)-Th(x)\big|\le\epsilon_h,$$
where $\epsilon_h$ is a small error depending only on $h$.

(4) Take a function $g$ and find $h$ such that $Th(x)=g(x)-Eg(Z)$. By antisymmetry of $\alpha h$ and the exchangeability of $(X,X')$, it follows that $E(\alpha h(X,X'))=0$. Combining with the previous step, we have the error bound $|Eg(X)-Eg(Z)|\le\epsilon_h$.
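Step (1) can be checked numerically in the gaussian example: for a smooth $g$ of moderate growth, $E(g'(Z)-Zg(Z))$ should vanish when $Z$ is standard gaussian, but not in general for other laws. The Python sketch below approximates this expectation with Simpson's rule; the test functions $g$, the integration window, and the comparison law (uniform on $[-\sqrt3,\sqrt3]$, which also has mean 0 and variance 1) are our choices.

```python
import math
from typing import Callable


def stein_expectation(g: Callable[[float], float], dg: Callable[[float], float],
                      density: Callable[[float], float], lo: float, hi: float,
                      steps: int = 20000) -> float:
    """Approximate E[g'(W) - W g(W)] for W with the given density on [lo, hi].

    Uses composite Simpson's rule; `steps` must be even."""
    h = (hi - lo) / steps

    def integrand(x: float) -> float:
        return (dg(x) - x * g(x)) * density(x)

    total = integrand(lo) + integrand(hi)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return total * h / 3


def gaussian_density(x: float) -> float:
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


root3 = math.sqrt(3.0)


def uniform_density(x: float) -> float:
    return 1.0 / (2 * root3)  # uniform on [-sqrt(3), sqrt(3)]: mean 0, variance 1


# T g(x) = g'(x) - x g(x) with g = sin: close to 0 under N(0, 1).
print(stein_expectation(math.sin, math.cos, gaussian_density, -12.0, 12.0))
# g(x) = x^3: E[3W^2 - W^4] = 3 - 9/5 = 1.2 for the uniform law, so T does not annihilate it.
print(stein_expectation(lambda x: x ** 3, lambda x: 3 * x ** 2, uniform_density, -root3, root3))
```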

There are other variants of Stein's method, most notably the generator method of Andrew Barbour [?], the dependency graph approach introduced by Chen [?] and Baldi and Rinott [2] and popularized by Arratia, Goldstein and Gordon [?], the size-biased coupling method of Barbour, Holst and Janson [?], and the zero-biased coupling method due to Goldstein and Reinert [?]. The recent applications to algebraic problems by Jason Fulman [20, 21], and the quest for Berry-Esseen bounds by Rinott and Rotar [?] and Shao and Su [?], are also worthy of note.

However, it is not our purpose here to go deeply into the regular versions of Stein's method. For further references and exposition, we refer to the recent monograph [?]. For applications of the method of exchangeable pairs and other versions of Stein's method to Poisson approximation, one can look at the survey paper by Chatterjee, Diaconis & Meckes [13].

1.7. Concentration inequalities. The theory of concentration inequalities tries to answer the following question: given a random variable $X$ taking values in some measure space $\mathcal X$ (which is usually some high dimensional Euclidean space), and a measurable map $f:\mathcal X\to\mathbb R$, what is a good explicit bound on $P\{|f(X)-Ef(X)|\ge t\}$? Exact evaluation or accurate approximation is, of course, the central purpose of probability theory itself. In situations where this is not possible, concentration inequalities aim to do the next best job by providing rapidly decaying tail bounds.

The literature on concentration inequalities is huge, ranging from the pioneering inequalities of Hoeffding [?] to the momentous work of Talagrand [?], but most of it revolves around well-behaved functions of independent random variables. For a nearly complete account of the literature until the year 2001, we refer the reader to the definitive resource in this subject, the monograph [?] by Michel Ledoux. The methods of Kim and Vu [?] and Boucheron, Lugosi, and Massart [?] are significant recent developments.

The techniques developed in [?] (and partially presented here) have some basic similarities with the concentration results of Schmuckenschläger [?], but go much beyond them in terms of applications. Other than that (and log-Sobolev inequalities, which are much harder to obtain anyway), there is very little, even in the vast concentration literature, about the concentration of functions of dependent random variables, particularly in the discrete setting. We hope that our version of Stein's method will partially fill this void.

Acknowledgments. I am grateful to Persi Diaconis and Yuval Peres for 
many useful comments and suggestions. Thanks are also due to the two 
anonymous referees for pointing out several omissions and errors. 

2. Proofs 

Before proving Theorem 1.5, let us see how it is applied to work out the three examples described in Section 1.

Proof of Proposition 1.1. Construct $X'$ as follows: choose $I,J$ uniformly and independently at random from $\{1,\ldots,n\}$. Let $\pi'=\pi\circ(I,J)$, where $(I,J)$ denotes the transposition of $I$ and $J$. It can be easily verified that $(\pi,\pi')$ is an exchangeable pair. Hence if we let
$$X':=\sum_{i=1}^na_{i\pi'(i)},$$
then $(X,X')$ is also an exchangeable pair. Now note that, since $E(X)=\frac1n\sum_{i,j}a_{ij}$,
$$\frac12E\big(n(X-X')\mid\pi\big)=\frac n2E\big(a_{I\pi(I)}+a_{J\pi(J)}-a_{I\pi(J)}-a_{J\pi(I)}\mid\pi\big)=X-E(X).$$
Thus, we can take $f(x)=x-E(X)$ and $F(x,y)=\frac12n(x-y)$. Now note that since $0\le a_{ij}\le1$ for all $i$ and $j$, each quantity $a_{i\pi(i)}+a_{j\pi(j)}-a_{i\pi(j)}-a_{j\pi(i)}$ lies in $[-2,2]$, so its square is at most twice its absolute value; hence
$$\frac12E\big(|(f(X)-f(X'))F(X,X')|\mid\pi\big)=\frac n4E\big((X-X')^2\mid\pi\big)=\frac1{4n}\sum_{i,j=1}^n\big(a_{i\pi(i)}+a_{j\pi(j)}-a_{i\pi(j)}-a_{j\pi(i)}\big)^2$$
$$\le\frac1{2n}\sum_{i,j=1}^n\big(a_{i\pi(i)}+a_{j\pi(j)}+a_{i\pi(j)}+a_{j\pi(i)}\big)=X+E(X)=f(X)+2E(X).$$
Since the last quantity depends only on $X$, it follows that $\Delta(X)\le f(X)+2E(X)$. Applying part (ii) of Theorem 1.5 with $B=1$ and $C=2E(X)$ completes the proof. □
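The two conditional computations in this proof can be verified exactly for a small array by averaging over all $n^2$ choices of $(I,J)$. In the Python sketch below (helper names are ours), `proof_quantities` returns $X$, $E(X)$, $E(F(X,X')\mid\pi)$ and $\frac12E(|(f(X)-f(X'))F(X,X')|\mid\pi)$ for a given permutation, and the array entries are drawn from $[0,1]$ as the proposition requires.

```python
import itertools
import random


def proof_quantities(a, perm):
    """Return X, E(X), E(F | pi) and (1/2) E(|(f(X) - f(X')) F| | pi) for the pair in the proof."""
    n = len(a)
    x = sum(a[i][perm[i]] for i in range(n))
    ex = sum(map(sum, a)) / n  # E(X) = (1/n) sum_{i,j} a_ij
    mean_diff = 0.0  # E(X - X' | pi), with I, J uniform and independent
    mean_sq = 0.0    # E((X - X')^2 | pi)
    for i in range(n):
        for j in range(n):
            # X - X' when pi' = pi o (i, j); the term vanishes when i == j
            d = a[i][perm[i]] + a[j][perm[j]] - a[i][perm[j]] - a[j][perm[i]]
            mean_diff += d / n ** 2
            mean_sq += d * d / n ** 2
    cond_f = 0.5 * n * mean_diff  # F(x, y) = (n/2)(x - y)
    delta = 0.25 * n * mean_sq    # (1/2) * (n/2) * E((X - X')^2 | pi)
    return x, ex, cond_f, delta


rng = random.Random(7)
a = [[rng.random() for _ in range(5)] for _ in range(5)]  # entries in [0, 1]
results = [proof_quantities(a, p) for p in itertools.permutations(range(5))]
print(max(abs(cf - (x - ex)) for x, ex, cf, _ in results))  # E(F | pi) = X - E(X): expect ~0
print(max(delta - (x + ex) for x, ex, _, delta in results))  # Delta <= X + E(X): expect <= 0
```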

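Proof of Proposition 1.2. This follows directly from part (iii) of Theorem 1.5 and the computations done in the proof of Proposition 1.1, which show that $\Delta(X)=E(A\mid X)$, so that $E(\Delta(X)^k)\le E(A^k)$ by Jensen's inequality. □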
Proof of Proposition 1.3. Suppose $\sigma$ is drawn from the Gibbs distribution. We construct $\sigma'$ by taking a step in the Gibbs sampler as follows: choose a coordinate $I$ uniformly at random, and replace the $I$-th coordinate of $\sigma$ by an element drawn from the conditional distribution of the $I$-th coordinate given the rest. It is well known and easy to prove that $(\sigma,\sigma')$ is an exchangeable pair. Let
$$F(\sigma,\sigma'):=\sum_{i=1}^n(\sigma_i-\sigma_i').$$
Now define
$$m_i(\sigma):=\frac1n\sum_{j\ne i}\sigma_j,\qquad i=1,\ldots,n.$$
Since the Hamiltonian is a simple explicit function, the conditional distribution of the $i$-th coordinate given the rest is easy to obtain. An easy computation gives $E(\sigma_i\mid\{\sigma_j,\,j\ne i\})=\tanh(\beta m_i+\beta h)$. Thus, we have
$$f(\sigma):=E(F(\sigma,\sigma')\mid\sigma)=\frac1n\sum_{i=1}^n\big(\sigma_i-E(\sigma_i\mid\{\sigma_j,\,j\ne i\})\big)=m-\frac1n\sum_{i=1}^n\tanh(\beta m_i+\beta h).$$

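Now note that $|F(\sigma,\sigma')|\le2$, because $\sigma$ and $\sigma'$ differ in at most one coordinate. Also, changing one coordinate of $\sigma$ changes $m$ and each $m_i$ by at most $2/n$; since the map $x\mapsto\tanh x$ is 1-Lipschitz, we have
$$|f(\sigma)-f(\sigma')|\le\frac2n+\frac{2\beta}n=\frac{2(1+\beta)}n,\qquad\text{and hence}\qquad\Delta(\sigma)\le\frac{2(1+\beta)}n.$$
Thus, by part (ii) of Theorem 1.5 (with $B=0$ and $C=2(1+\beta)/n$), we have
$$P\Big\{\Big|m-\frac1n\sum_{i=1}^n\tanh(\beta m_i+\beta h)\Big|\ge\frac t{\sqrt n}\Big\}\le2\exp\Big(-\frac{t^2}{4(1+\beta)}\Big).$$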



Finally, note that by the Lipschitz nature of the tanh function, and since $|m_i-m|=1/n$ for each $i$, we get
$$\Big|\frac1n\sum_{i=1}^n\tanh(\beta m_i+\beta h)-\tanh(\beta m+\beta h)\Big|\le\frac1n\sum_{i=1}^n\big|\tanh(\beta m_i+\beta h)-\tanh(\beta m+\beta h)\big|\le\frac\beta n\sum_{i=1}^n|m_i-m|=\frac\beta n.$$
This completes the proof. □
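The exchangeable pair in this proof is one step of the random-scan Gibbs sampler. The Python sketch below implements that step for the Curie-Weiss measure (1), using $P(\sigma_i=+1\mid\{\sigma_j,\,j\ne i\})=(1+\tanh(\beta m_i+\beta h))/2$, and also evaluates the deterministic quantity bounded by $\beta/n$ at the end of the proof. It is a sketch, not a tuned sampler: the names, the run length of $100n$ steps, and the parameter values are arbitrary choices of ours.

```python
import math
import random
from typing import List


def gibbs_step(sigma: List[int], beta: float, h: float, rng: random.Random) -> None:
    """Pick a coordinate I uniformly and resample sigma_I from its conditional law (in place)."""
    n = len(sigma)
    i = rng.randrange(n)
    m_i = (sum(sigma) - sigma[i]) / n  # m_i = (1/n) sum_{j != i} sigma_j
    p_plus = (1.0 + math.tanh(beta * m_i + beta * h)) / 2.0
    sigma[i] = 1 if rng.random() < p_plus else -1


def averaging_error(sigma: List[int], beta: float, h: float) -> float:
    """|(1/n) sum_i tanh(beta m_i + beta h) - tanh(beta m + beta h)|; the proof shows it is <= beta/n."""
    n = len(sigma)
    m = sum(sigma) / n
    avg = sum(math.tanh(beta * (m - s / n) + beta * h) for s in sigma) / n
    return abs(avg - math.tanh(beta * m + beta * h))


rng = random.Random(0)
n, beta, h = 100, 0.5, 0.1
sigma = [rng.choice((-1, 1)) for _ in range(n)]
for _ in range(100 * n):
    gibbs_step(sigma, beta, h, rng)
m = sum(sigma) / n
print(m, math.tanh(beta * m + beta * h))  # typically close, in line with Proposition 1.3
```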
Proof of Proposition 1.4. As in the proof of Proposition 1.3, we produce $\sigma'$ by taking a step in the Gibbs sampler: a coordinate $I$ is chosen uniformly at random, and $\sigma_I$ is replaced by $\sigma_I'$ drawn from the conditional distribution of the $I$-th coordinate given $(\sigma_j)_{j\ne I}$. For each $i$, let
$$m_i=m_i(\sigma):=\sum_{j:\,\{i,j\}\in E}\sigma_j,$$
so that $E(\sigma_i\mid\{\sigma_j,\,j\ne i\})=\tanh(\beta m_i)$. Now fix $u\ge0$ and define
$$F(\sigma,\sigma'):=(\sigma_I-\sigma_I')\big(\tanh(\beta m_I)-\tanh(u\,m_I)\big).$$
Then $F(\sigma,\sigma')=-F(\sigma',\sigma)$ because $m_I(\sigma)=m_I(\sigma')$. Now let
$$f(\sigma):=E(F(\sigma,\sigma')\mid\sigma)=\frac1n\sum_{i=1}^n\big(\sigma_i-\tanh(\beta m_i)\big)\big(\tanh(\beta m_i)-\tanh(u\,m_i)\big).$$

Now, if $r$ is the maximum degree of $G$, then at most $r+1$ terms in the sums defining $f(\sigma)$ and $f(\sigma')$ are unequal, and they all lie in the interval $[-4,4]$. Thus, $|f(\sigma)-f(\sigma')|\le8(r+1)/n$. Also, evidently, $|F(\sigma,\sigma')|\le4$, so that $\Delta(\sigma)\le16(r+1)/n$. Using all this information in part (ii) of Theorem 1.5 (with $B=0$), we get
$$P\{f(\sigma)\le-t\}\le\exp\Big(-\frac{nt^2}{32(r+1)}\Big).$$
Now, a direct verification shows that
$$S(u)-S(\beta)=\frac1n\sum_{i=1}^n\big(\tanh(\beta m_i)-\tanh(u\,m_i)\big)^2+2f(\sigma).$$
Thus,
$$P\{S(\beta)>S(u)+t\}\le P\{2f(\sigma)<-t\}\le\exp\Big(-\frac{nt^2}{128(r+1)}\Big).\tag{5}$$
Now note that for any $u,v\ge0$, we have
$$|S(u)-S(v)|\le\frac1n\sum_{i=1}^n\big|\big(2\sigma_i-\tanh(u\,m_i)-\tanh(v\,m_i)\big)\big(\tanh(v\,m_i)-\tanh(u\,m_i)\big)\big|\le\frac4n\sum_{i=1}^n\big|\tanh(v\,m_i)-\tanh(u\,m_i)\big|\le4r|u-v|,$$
since $|m_i(u-v)|\le r|u-v|$. Let $N=\lceil\sqrt{nr\log n}\,\rceil$, and let
$$u_k=k\sqrt{\frac{\log n}{nr}}\qquad\text{for }k=1,2,\ldots,N.$$
nr 
Then, if $u_{k-1}\le u\le u_k$ (with $u_0:=0$), the above inequality gives
$$|S(u)-S(u_k)|\le4r|u-u_k|\le4\sqrt{\frac{r\log n}n}.$$
Now take any $u>u_N$. Since $m_i\in\{0,\pm1,\ldots,\pm r\}$, we have $|\tanh(u\,m_i)-\tanh(u_Nm_i)|\le1-\tanh(u_N|m_i|)\le1-\tanh(u_N)$ whenever $m_i\ne0$, while the difference vanishes when $m_i=0$. Thus, using $1-\tanh x\le e^{-x}$ and $u_N\ge\log n$,
$$|S(u)-S(u_N)|\le\frac4n\sum_{i=1}^n\big|\tanh(u\,m_i)-\tanh(u_Nm_i)\big|\le4(1-\tanh(u_N))\le4e^{-u_N}\le\frac4n.$$
If $n\ge3$, then $\sqrt{r\log n/n}\ge1/n$. Combining the steps, we see that for $n\ge3$,
$$\min_{1\le k\le N}S(u_k)\le\min_{u\ge0}S(u)+4\sqrt{\frac{r\log n}n}.$$

Finally, combining this with (5), we get
$$P\Big\{S(\beta)>\min_{u\ge0}S(u)+4\sqrt{\frac{r\log n}n}+t\Big\}\le P\Big\{S(\beta)>\min_{1\le k\le N}S(u_k)+t\Big\}\le\sum_{k=1}^NP\{S(\beta)>S(u_k)+t\}\le N\exp\Big(-\frac{nt^2}{128(r+1)}\Big).$$
It is now easy to complete the proof by substituting the value of $N$ and choosing $t\ge\sqrt{Cr\log n/n}$ for a sufficiently large constant $C$, so that the effect of $N$ washes out. □
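Two deterministic ingredients of this proof lend themselves to a quick numerical check: the decomposition of $S(u)-S(\beta)$ behind (5), and the claim that the grid $u_1,\ldots,u_N$ comes within $4\sqrt{r\log n/n}$ of the minimum of $S$ over $u\ge0$ when $n\ge3$. The Python sketch below checks both on a cycle graph ($r=2$); the configuration, the values $\beta=0.7$, $u=1.3$, and the fine grid on $[0,10]$ standing in for the continuum $u\ge0$ are arbitrary choices of ours.

```python
import math
import random
from typing import List, Sequence


def local_fields(sigma: Sequence[int], neighbors: Sequence[Sequence[int]]) -> List[int]:
    """m_i = sum of sigma_j over the neighbors j of vertex i."""
    return [sum(sigma[j] for j in nb) for nb in neighbors]


def S(u: float, sigma: Sequence[int], m: Sequence[int]) -> float:
    return sum((s - math.tanh(u * mi)) ** 2 for s, mi in zip(sigma, m)) / len(sigma)


def f_stat(beta: float, u: float, sigma: Sequence[int], m: Sequence[int]) -> float:
    """f(sigma) = (1/n) sum_i (sigma_i - tanh(beta m_i)) (tanh(beta m_i) - tanh(u m_i))."""
    return sum((s - math.tanh(beta * mi)) * (math.tanh(beta * mi) - math.tanh(u * mi))
               for s, mi in zip(sigma, m)) / len(sigma)


def identity_gap(beta: float, u: float, sigma, m) -> float:
    """S(u) - S(beta) minus [(1/n) sum_i (tanh(beta m_i) - tanh(u m_i))^2 + 2 f(sigma)]; ~0."""
    quad = sum((math.tanh(beta * mi) - math.tanh(u * mi)) ** 2 for mi in m) / len(sigma)
    return S(u, sigma, m) - S(beta, sigma, m) - (quad + 2.0 * f_stat(beta, u, sigma, m))


def grid(n: int, r: int) -> List[float]:
    """u_k = k sqrt(log n / (n r)), k = 1, ..., N, with N = ceil(sqrt(n r log n))."""
    N = math.ceil(math.sqrt(n * r * math.log(n)))
    step = math.sqrt(math.log(n) / (n * r))
    return [k * step for k in range(1, N + 1)]


n, r = 30, 2
neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]  # cycle graph, degree 2
rng = random.Random(5)
sigma = [rng.choice((-1, 1)) for _ in range(n)]
m = local_fields(sigma, neighbors)

gap = identity_gap(0.7, 1.3, sigma, m)
grid_min = min(S(u, sigma, m) for u in grid(n, r))
fine_min = min(S(0.001 * k, sigma, m) for k in range(10001))  # u in [0, 10]
slack = 4 * math.sqrt(r * math.log(n) / n)
print(gap, grid_min, fine_min, slack)
```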

Finally, let us prove our main result. 

Proof of Theorem 1.5. Let us begin with a useful general identity. Suppose $h:\mathcal X\to\mathbb R$ is any measurable map such that $E|h(X)F(X,X')|<\infty$. Then clearly $E(h(X)f(X))=E(h(X)F(X,X'))$. Using the exchangeability of $X$ and $X'$, and the antisymmetry of $F$, we have
$$E(h(X)F(X,X'))=E(h(X')F(X',X))=-E(h(X')F(X,X')).$$
Thus, we have
$$E(h(X)f(X))=E(h(X)F(X,X'))=\frac12E\big((h(X)-h(X'))F(X,X')\big).\tag{6}$$

The above equation is the basis of all that follows. First, note that by putting $h\equiv1$, we immediately get $E(f(X))=0$. Similarly, part (i) of the theorem follows by putting $h=f$. Next, let us prove (ii). Let $m(\theta):=E(e^{\theta f(X)})$ be the moment generating function of $f(X)$. We can differentiate $m(\theta)$ and move the derivative inside the expectation because of the assumption that $E(e^{\theta f(X)}|F(X,X')|)<\infty$ for all $\theta$. Thus, by equation (6), we have
$$m'(\theta)=E\big(e^{\theta f(X)}f(X)\big)=\frac12E\big((e^{\theta f(X)}-e^{\theta f(X')})F(X,X')\big).$$

Now note that for any $x,y\in\mathbb R$,
$$\Big|\frac{e^x-e^y}{x-y}\Big|=\int_0^1e^{tx+(1-t)y}\,dt\le\int_0^1\big(te^x+(1-t)e^y\big)\,dt=\frac12(e^x+e^y).\tag{7}$$

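Using this inequality, and the exchangeability of $X$ and $X'$, we get
$$|m'(\theta)|\le\frac{|\theta|}4E\big((e^{\theta f(X)}+e^{\theta f(X')})|(f(X)-f(X'))F(X,X')|\big)=\frac{|\theta|}2E\big(e^{\theta f(X)}\Delta(X)+e^{\theta f(X')}\Delta(X')\big)$$
$$=|\theta|E\big(e^{\theta f(X)}\Delta(X)\big)\le|\theta|E\big(e^{\theta f(X)}(Bf(X)+C)\big)=B|\theta|m'(\theta)+C|\theta|m(\theta).$$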

Since $m$ is a convex function and $m'(0)=E(f(X))=0$, $m'(\theta)$ always has the same sign as $\theta$. Thus, for $0<\theta<1/B$, the above inequality translates into
$$\frac d{d\theta}\log m(\theta)\le\frac{C\theta}{1-B\theta}.$$

Using this and recalling that $m(0)=1$, we have
$$\log m(\theta)\le\int_0^\theta\frac{Cu}{1-Bu}\,du\le\frac{C\theta^2}{2(1-B\theta)}.$$
Putting $\theta=t/(C+Bt)$, we get
$$P\{f(X)\ge t\}\le\exp\big(-\theta t+\log m(\theta)\big)\le e^{-t^2/(2C+2Bt)}.$$
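Spelled out, the choice $\theta=t/(C+Bt)$ works as follows (assuming $C>0$, so that $0<\theta<1/B$): since $1-B\theta=C/(C+Bt)$,
$$-\theta t+\frac{C\theta^2}{2(1-B\theta)}=-\frac{t^2}{C+Bt}+\frac{Ct^2}{(C+Bt)^2}\cdot\frac{C+Bt}{2C}=-\frac{t^2}{C+Bt}+\frac{t^2}{2(C+Bt)}=-\frac{t^2}{2C+2Bt}.$$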

The lower tail can be done similarly; note that for $\theta<0$, we have $m'(\theta)\le0$, and hence
$$|m'(\theta)|\le B|\theta|m'(\theta)+C|\theta|m(\theta)\le C|\theta|m(\theta),$$
and this is the reason why $B$ does not appear in the lower tail bound. This completes the proof of part (ii). For the moment inequalities in part (iii), first observe that by equation (6), we have
$$E(f(X)^{2k})=\frac12E\big((f(X)^{2k-1}-f(X')^{2k-1})F(X,X')\big).$$
By the inequality
$$|x^{2k-1}-y^{2k-1}|\le\frac{2k-1}2\big(x^{2k-2}+y^{2k-2}\big)|x-y|,$$
which follows easily from a convexity argument very similar to (7), we have
$$E(f(X)^{2k})\le(2k-1)E\big(f(X)^{2k-2}\Delta(X)\big).$$
By Hölder's inequality, we get
$$E(f(X)^{2k})\le(2k-1)\big(E(f(X)^{2k})\big)^{(k-1)/k}\big(E(\Delta(X)^k)\big)^{1/k}.$$
The proof is completed by transferring $\big(E(f(X)^{2k})\big)^{(k-1)/k}$ to the other side.

□